Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) with SMTP id NAA07475
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 11 Sep 1998 13:09:15 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA32575; Fri, 11 Sep 1998 13:08:48 -0700
To: icon-group@optima.CS.Arizona.EDU
Date: 11 Sep 1998 11:18:47 -0700
From: Patrick Scheible <kkt@itchy.serv.net>
Message-Id: <iozpc65x88.fsf@itchy.serv.net>
Organization: ServNet Internet Services
Sender: icon-group-request@optima.CS.Arizona.EDU
References: <199809102056.IAA16557@atlas.otago.ac.nz>
Subject: Re: Unicode support or support for non-Ascii based character manipulation?
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Gordon Peterson (http://www.computek.net/public/gep2/) wrote:
> Okay, I don't dispute that this move is happening but personally I
> still don't very much like it. The fact is that (at least here in the
> Western Hemisphere, where probably most of the world's computers are
> used) an eight-bit byte is already quite sufficient for most purposes,
> and doubling it comes at a cost in complexity and storage (RAM, disk,
> tape, whatever) which is simply very, very hard to justify on any
> genuine economic basis.
ASCII is also NOT adequate for many purposes even in the United
States. Almost every word processor has its own incompatible way of
representing diacritical marks and characters that were omitted from
ASCII. (By the way, did you know that there are other countries in
the Western Hemisphere besides the United States? And most of them
don't speak English?) I work in a library, and libraries found plain
ASCII inadequate all the way back in the early 1960s, when the
computer programmers were still bitching about people who wanted
lowercase letters. (By the way, the character set libraries adopted
does a much better job of accommodating all the Roman-alphabet languages
than the later ISO standards; pre-composed characters with diacritical
marks greatly expand the character set and still leave out some
combinations that occur in Roman-alphabet languages.)
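To illustrate the point about pre-composed characters, here is a minimal
Python sketch. It uses Unicode's combining marks and the standard
unicodedata module purely as an illustration (the library character set
mentioned above is not modeled here), and the helper names code_points
and compose are made up for this example. An accented letter that has a
pre-composed code point folds into one character, while a combination
that was never given one stays as base letter plus mark.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

def compose(text):
    """Fold base letter + combining mark into a pre-composed character
    wherever Unicode defines one (normalization form NFC)."""
    return unicodedata.normalize("NFC", text)

e_acute = "e\u0301"  # e + COMBINING ACUTE ACCENT
g_tilde = "g\u0303"  # g + COMBINING TILDE, a letter used in Guarani

print(code_points(compose(e_acute)))  # ['U+00E9'], a pre-composed form exists
print(code_points(compose(g_tilde)))  # ['U+0067', 'U+0303'], none exists
```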
There are borrowed words with diacritical marks, place names from
foreign languages, personal names, quotations from Old English.
That's not even counting other Roman-alphabet languages.
> If other countries have more difficult (or huge) character sets,
> that is (while a fact of life) simply an inherent disadvantage
> of their culture (and note that I'm not intending that as a slam
> or value judgement, it just IS the way it is), and I don't see a
> terribly convincing argument why the other countries (without
> that disadvantage) ought to pay the price too, just in order to
> artificially level the playing field.
Many of those non-Roman character sets are no more difficult than
Roman. Cyrillic has enough letters to spell the major sounds in its
languages, which you've got to admit is a plus. Greek, Hebrew,
Arabic, and numerous other alphabets are no harder in themselves than
the Roman.
Part of what made them a pain to program was that industry groups
and national standards organizations each took it on themselves to make
their own 8-bit encodings, so you had to look outside the character
string to interpret the bytes in it. Even if you skip the Han
character set parts of Unicode, Unicode is a huge blessing in that all
the other alphabets have code points within Unicode.
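A small Python sketch of the problem with competing 8-bit encodings. The
three byte values and the choice of Latin-1, KOI8-R, and ISO 8859-7 are
picked for illustration only, and decode_all is a hypothetical helper.
The same bytes come out as Western European, Cyrillic, or Greek letters
depending on which encoding you are told to assume; nothing in the bytes
themselves says which is right.

```python
# Three bytes with the high bit set; their meaning depends entirely
# on which 8-bit encoding the reader assumes.
raw = bytes([0xC0, 0xE4, 0xF1])

def decode_all(data, encodings=("latin-1", "koi8-r", "iso-8859-7")):
    """Decode the same bytes under each named 8-bit encoding."""
    return {name: data.decode(name) for name in encodings}

for name, text in decode_all(raw).items():
    # Show the decoded text alongside its Unicode code points.
    points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{name:>10}: {text}  ({points})")
```

Once decoded, each result has its own distinct Unicode code points, so
the string no longer needs an outside label to be interpreted.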
The United States is not an island. Closing our eyes and pretending
that the rest of the world doesn't exist and doesn't buy our software
would be a bad idea even if it were possible.
If you're concerned about efficiency, maybe you should worry about all
the gratuitous graphics. Compressed Unicode uses little or no more
disk or tape space than uncompressed ASCII. Compressing and
uncompressing strings adds some complexity, but you get some
simplicity by not having to keep track of which character set you're
in and switching back and forth between character sets within what is
logically one string.
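One way to check the storage claim yourself, as a rough Python sketch.
It assumes 16-bit Unicode stored as UTF-16 and uses zlib as a stand-in
for whatever compressor you prefer; SAMPLE and storage_sizes are names
made up for this example. The actual numbers depend on the text and the
compressor, so treat the output as a measurement, not a guarantee.

```python
import zlib

# Any plain-ASCII text will do; this is a sentence from the message above.
SAMPLE = (
    "Part of what made them a pain to program was that most of the "
    "industry and national standards organizations all took it on "
    "themselves to make their own 8-bit encodings, so you had to look "
    "outside the character string to interpret the bytes in it."
)

def storage_sizes(text):
    """Compare byte counts for raw ASCII, raw UTF-16, and zlib-compressed UTF-16."""
    ascii_raw = text.encode("ascii")
    utf16_raw = text.encode("utf-16-le")  # two bytes per character, no BOM
    utf16_zlib = zlib.compress(utf16_raw, 9)
    # Compression must be lossless: decompressing gives back the same text.
    assert zlib.decompress(utf16_zlib).decode("utf-16-le") == text
    return {
        "ascii_raw": len(ascii_raw),
        "utf16_raw": len(utf16_raw),
        "utf16_zlib": len(utf16_zlib),
    }

for label, size in storage_sizes(SAMPLE).items():
    print(f"{label:>10}: {size} bytes")
```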
-- Patrick Scheible